Extraction of Bilingual Cognates from Wikipedia
Abstract
In this article, we propose a method to extract translation equivalents with similar spelling from comparable corpora. The method was applied to Wikipedia to extract a large number of Portuguese-Spanish bilingual terminological pairs that were not found in existing dictionaries. The resulting bilingual lexicon consists of more than 27,000 new pairs of lemmas and multiwords, with about 92% accuracy.
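The abstract does not say how spelling similarity is measured, so the following is only a minimal sketch under stated assumptions: similarity is taken to be a length-normalized Levenshtein distance, and a pair is kept as a cognate candidate when it clears a threshold. The function names and the 0.7 threshold are hypothetical and are not taken from the article.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical spelling."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def cognate_candidates(pt_terms: Iterable[str],
                       es_terms: Iterable[str],
                       threshold: float = 0.7  # assumed value, not from the article
                       ) -> List[Tuple[str, str, float]]:
    """Return (pt, es, score) pairs whose spelling similarity clears the threshold."""
    es_list = list(es_terms)
    pairs = []
    for pt in pt_terms:
        for es in es_list:
            score = spelling_similarity(pt.lower(), es.lower())
            if score >= threshold:
                pairs.append((pt, es, score))
    # Best-scoring pairs first.
    return sorted(pairs, key=lambda p: -p[2])
```

In a setting like the one described, the two term lists would presumably come from comparable articles rather than from the whole vocabulary, which keeps the all-pairs comparison small; that restriction is also an assumption here.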
Similar resources
Extraction de lexiques bilingues à partir de Wikipédia (Bilingual lexicon extraction from Wikipedia) [in French]
With the increased interest in machine translation, the need for multilingual resources such as comparable corpora and bilingual lexicons has increased. These resources are lacking mainly for language pairs that do not involve English. This...
Bilingual Dictionary Extraction from Wikipedia
Mining comparable corpora and choosing an extraction strategy are the two essential elements of bilingual dictionary extraction from comparable corpora. This paper first proposes a method that uses the interlanguage links in Wikipedia to build comparable corpora. The large scale of Wikipedia ensures the quantity of collected comparable corpora. Besides, because the inter-language...
Measuring Comparability of Multilingual Corpora Extracted from Wikipedia
Comparable corpora can be used for many linguistic tasks such as bilingual lexicon extraction. By improving the quality of comparable corpora, we improve the quality of the extraction. This article describes some strategies to build comparable corpora from Wikipedia and proposes a measure of comparability. Experiments were performed on Portuguese, Spanish, and English Wikipedia.
Iterative Bilingual Lexicon Extraction from Comparable Corpora Using Topic Model and Context Based Methods
In the literature, two main categories of methods have been proposed for bilingual lexicon extraction from comparable corpora: topic-model-based and context-based methods. In this paper, we present a bilingual lexicon extraction system based on a novel combination of these two methods in an iterative process. Our system does not rely on any prior knowledge and the performance can be it...